28 research outputs found

    Sublinear algorithms for Earth Mover's Distance

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Includes bibliographical references (p. 14-15).We study the problem of estimating the Earth Mover's Distance (EMD) between probability distributions when given access only to samples. We give closeness testers and additive-error estimators over domains in [0, [delta]]d, with sample complexities independent of domain size - permitting the testability even of continuous distributions over infinite domains. Instead, our algorithms depend on other parameters, such as the diameter of the domain space, which may be significantly smaller. We also prove lower bounds showing our testers to be optimal in their dependence on these parameters. Additionally, we consider whether natural classes of distributions exist for which there are algorithms with better dependence on the dimension, and show that for highly clusterable data, this is indeed the case. Lastly, we consider a variant of the EMD, defined over tree metrics instead of the usual L₁ metric, and give optimal algorithms.by Khanh Do Ba.S.M

    Wait-Free and Obstruction-Free Snapshot

    Get PDF
    The snapshot problem was first proposed over a decade ago and has since been well-studied in the distributed algorithms community. The challenge is to design a data structure consisting of mm components, shared by upto nn concurrent processes, that supports two operations. The first, Update(i,v)Update(i,v), atomically writes vv to the iith component. The second, Scan()Scan(), returns an atomic snapshot of all mm components. We consider two termination properties: wait-freedom, which requires a process to always terminate in a bounded number of its own steps, and the weaker obstruction-freedom, which requires such termination only for processes that eventually execute uninterrupted. First, we present a simple, time and space optimal, obstruction-free solution to the single-writer, multi-scanner version of the snapshot problem (wherein concurrent Updates never occur on the same component). Second, we assume hardware support for compare&swap (CAS) to give a time-optimal, wait-free solution to the multi-writer, single-scanner snapshot problem (wherein concurrent Scans never occur). This algorithm uses only O(mn)O(mn) space and has optimal CAS, write and remote-reference complexities. Additionally, it can be augmented to implement a general snapshot object with the same time and space bounds, thus improving the space complexity of O(mn2)O(mn^2) of the only previously known time-optimal solution

    Lower Bounds for Sparse Recovery

    Get PDF
    We consider the following k-sparse recovery problem: design an m x n matrix A, such that for any signal x, given Ax we can efficiently recover x' satisfying ||x-x'||_1 <= C min_{k-sparse} x"} ||x-x"||_1. It is known that there exist matrices A with this property that have only O(k log (n/k)) rows. In this paper we show that this bound is tight. Our bound holds even for the more general /randomized/ version of the problem, where A is a random variable and the recovery algorithm is required to work for any fixed x with constant probability (over A).Comment: 11 pages. Appeared at SODA 201

    Block Heavy Hitters

    Get PDF
    e study a natural generalization of the heavy hitters problem in thestreaming context. We term this generalization *block heavy hitters* and define it as follows. We are to stream over a matrixAA, and report all *rows* that are heavy, where a row is heavy ifits ell_1-norm is at least phi fraction of the ell_1 norm ofthe entire matrix AA. In comparison, in the standard heavy hittersproblem, we are required to report the matrix *entries* that areheavy. As is common in streaming, we solve the problem approximately:we return all rows with weight at least phi, but also possibly someother rows that have weight no less than (1-eps)phi. To solve theblock heavy hitters problem, we show how to construct a linear sketchof A from which we can recover the heavy rows of A.The block heavy hitters problem has already found applications forother streaming problems. In particular, it is a crucial buildingblock in a streaming algorithm that constructs asmall-size sketch for the Ulam metric, a metric on non-repetitivestrings under the edit (Levenshtein) distance

    Lower bounds for sparse recovery

    Get PDF
    We consider the following k-sparse recovery problem: design an m x n matrix A, such that for any signal x, given Ax we can efficiently recover ^x satisfying x|| ^x||1 [less than or equal to] C min[subscript k]-sparse x'||x - x'||1. It is known that there exist matrices A with this property that have only O(k log(n=k)) rows. In this paper we show that this bound is tight. Our bound holds even for the more general random- ized version of the problem, where A is a random variable, and the recovery algorithm is required to work for any fixed x with constant probability (over A).David & Lucile Packard FoundationDanish National Research FoundationDanish National Research Foundation (MADALGO (Center for Massive Data Algorithmics))National Science Foundation (U.S.) (grant CCF-0728645)Cisco Community Fellowship Progra

    Sublinear time algorithms for earth mover's distance

    Get PDF
    We study the problem of estimating the Earth Mover’s Distance (EMD) between probability distributions when given access only to samples of the distributions. We give closeness testers and additive-error estimators over domains in [0, 1][superscript d], with sample complexities independent of domain size – permitting the testability even of continuous distributions over infinite domains. Instead, our algorithms depend on other parameters, such as the diameter of the domain space, which may be significantly smaller. We also prove lower bounds showing the dependencies on these parameters to be essentially optimal. Additionally, we consider whether natural classes of distributions exist for which there are algorithms with better dependence on the dimension, and show that for highly clusterable data, this is indeed the case. Lastly, we consider a variant of the EMD, defined over tree metrics instead of the usual l 1 metric, and give tight upper and lower bounds

    Erratum: Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017

    Get PDF
    Interpretation: By quantifying levels and trends in exposures to risk factors and the resulting disease burden, this assessment offers insight into where past policy and programme efforts might have been successful and highlights current priorities for public health action. Decreases in behavioural, environmental, and occupational risks have largely offset the effects of population growth and ageing, in relation to trends in absolute burden. Conversely, the combination of increasing metabolic risks and population ageing will probably continue to drive the increasing trends in non-communicable diseases at the global level, which presents both a public health challenge and opportunity. We see considerable spatiotemporal heterogeneity in levels of risk exposure and risk-attributable burden. Although levels of development underlie some of this heterogeneity, O/E ratios show risks for which countries are overperforming or underperforming relative to their level of development. As such, these ratios provide a benchmarking tool to help to focus local decision making. Our findings reinforce the importance of both risk exposure monitoring and epidemiological research to assess causal connections between risks and health outcomes, and they highlight the usefulness of the GBD study in synthesising data to draw comprehensive and robust conclusions that help to inform good policy and strategic health planning

    Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks for 195 countries and territories, 1990-2017: a systematic analysis for the Global Burden of Disease Study 2017

    Get PDF
    Background The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2017 comparative risk assessment (CRA) is a comprehensive approach to risk factor quantification that offers a useful tool for synthesising evidence on risks and risk–outcome associations. With each annual GBD study, we update the GBD CRA to incorporate improved methods, new risks and risk–outcome pairs, and new data on risk exposure levels and risk–outcome associations. Methods We used the CRA framework developed for previous iterations of GBD to estimate levels and trends in exposure, attributable deaths, and attributable disability-adjusted life-years (DALYs), by age group, sex, year, and location for 84 behavioural, environmental and occupational, and metabolic risks or groups of risks from 1990 to 2017. This study included 476 risk–outcome pairs that met the GBD study criteria for convincing or probable evidence of causation. We extracted relative risk and exposure estimates from 46 749 randomised controlled trials, cohort studies, household surveys, census data, satellite data, and other sources. We used statistical models to pool data, adjust for bias, and incorporate covariates. Using the counterfactual scenario of theoretical minimum risk exposure level (TMREL), we estimated the portion of deaths and DALYs that could be attributed to a given risk. We explored the relationship between development and risk exposure by modelling the relationship between the Socio-demographic Index (SDI) and risk-weighted exposure prevalence and estimated expected levels of exposure and risk-attributable burden by SDI. Finally, we explored temporal changes in risk-attributable DALYs by decomposing those changes into six main component drivers of change as follows: (1) population growth; (2) changes in population age structures; (3) changes in exposure to environmental and occupational risks; (4) changes in exposure to behavioural risks; (5) changes in exposure to metabolic risks; and (6) changes due to all other factors, approximated as the risk-deleted death and DALY rates, where the risk-deleted rate is the rate that would be observed had we reduced the exposure levels to the TMREL for all risk factors included in GBD 2017. Findings In 2017, 34·1 million (95% uncertainty interval [UI] 33·3–35·0) deaths and 1·21 billion (1·14–1·28) DALYs were attributable to GBD risk factors. Globally, 61·0% (59·6–62·4) of deaths and 48·3% (46·3–50·2) of DALYs were attributed to the GBD 2017 risk factors. When ranked by risk-attributable DALYs, high systolic blood pressure (SBP) was the leading risk factor, accounting for 10·4 million (9·39–11·5) deaths and 218 million (198–237) DALYs, followed by smoking (7·10 million [6·83–7·37] deaths and 182 million [173–193] DALYs), high fasting plasma glucose (6·53 million [5·23–8·23] deaths and 171 million [144–201] DALYs), high body-mass index (BMI; 4·72 million [2·99–6·70] deaths and 148 million [98·6–202] DALYs), and short gestation for birthweight (1·43 million [1·36–1·51] deaths and 139 million [131–147] DALYs). In total, risk-attributable DALYs declined by 4·9% (3·3–6·5) between 2007 and 2017. In the absence of demographic changes (ie, population growth and ageing), changes in risk exposure and risk-deleted DALYs would have led to a 23·5% decline in DALYs during that period. Conversely, in the absence of changes in risk exposure and risk-deleted DALYs, demographic changes would have led to an 18·6% increase in DALYs during that period. The ratios of observed risk exposure levels to exposure levels expected based on SDI (O/E ratios) increased globally for unsafe drinking water and household air pollution between 1990 and 2017. This result suggests that development is occurring more rapidly than are changes in the underlying risk structure in a population. Conversely, nearly universal declines in O/E ratios for smoking and alcohol use indicate that, for a given SDI, exposure to these risks is declining. In 2017, the leading Level 4 risk factor for age-standardised DALY rates was high SBP in four super-regions: central Europe, eastern Europe, and central Asia; north Africa and Middle East; south Asia; and southeast Asia, east Asia, and Oceania. The leading risk factor in the high-income super-region was smoking, in Latin America and Caribbean was high BMI, and in sub-Saharan Africa was unsafe sex. O/E ratios for unsafe sex in sub-Saharan Africa were notably high, and those for alcohol use in north Africa and the Middle East were notably low. Interpretation By quantifying levels and trends in exposures to risk factors and the resulting disease burden, this assessment offers insight into where past policy and programme efforts might have been successful and highlights current priorities for public health action. Decreases in behavioural, environmental, and occupational risks have largely offset the effects of population growth and ageing, in relation to trends in absolute burden. Conversely, the combination of increasing metabolic risks and population ageing will probably continue to drive the increasing trends in non-communicable diseases at the global level, which presents both a public health challenge and opportunity. We see considerable spatiotemporal heterogeneity in levels of risk exposure and risk-attributable burden. Although levels of development underlie some of this heterogeneity, O/E ratios show risks for which countries are overperforming or underperforming relative to their level of development. As such, these ratios provide a benchmarking tool to help to focus local decision making. Our findings reinforce the importance of both risk exposure monitoring and epidemiological research to assess causal connections between risks and health outcomes, and they highlight the usefulness of the GBD study in synthesising data to draw comprehensive and robust conclusions that help to inform good policy and strategic health planning

    Safety and efficacy of fluoxetine on functional outcome after acute stroke (AFFINITY): a randomised, double-blind, placebo-controlled trial

    Get PDF
    Background Trials of fluoxetine for recovery after stroke report conflicting results. The Assessment oF FluoxetINe In sTroke recoverY (AFFINITY) trial aimed to show if daily oral fluoxetine for 6 months after stroke improves functional outcome in an ethnically diverse population. Methods AFFINITY was a randomised, parallel-group, double-blind, placebo-controlled trial done in 43 hospital stroke units in Australia (n=29), New Zealand (four), and Vietnam (ten). Eligible patients were adults (aged ≥18 years) with a clinical diagnosis of acute stroke in the previous 2–15 days, brain imaging consistent with ischaemic or haemorrhagic stroke, and a persisting neurological deficit that produced a modified Rankin Scale (mRS) score of 1 or more. Patients were randomly assigned 1:1 via a web-based system using a minimisation algorithm to once daily, oral fluoxetine 20 mg capsules or matching placebo for 6 months. Patients, carers, investigators, and outcome assessors were masked to the treatment allocation. The primary outcome was functional status, measured by the mRS, at 6 months. The primary analysis was an ordinal logistic regression of the mRS at 6 months, adjusted for minimisation variables. Primary and safety analyses were done according to the patient's treatment allocation. The trial is registered with the Australian New Zealand Clinical Trials Registry, ACTRN12611000774921. Findings Between Jan 11, 2013, and June 30, 2019, 1280 patients were recruited in Australia (n=532), New Zealand (n=42), and Vietnam (n=706), of whom 642 were randomly assigned to fluoxetine and 638 were randomly assigned to placebo. Mean duration of trial treatment was 167 days (SD 48·1). At 6 months, mRS data were available in 624 (97%) patients in the fluoxetine group and 632 (99%) in the placebo group. The distribution of mRS categories was similar in the fluoxetine and placebo groups (adjusted common odds ratio 0·94, 95% CI 0·76–1·15; p=0·53). Compared with patients in the placebo group, patients in the fluoxetine group had more falls (20 [3%] vs seven [1%]; p=0·018), bone fractures (19 [3%] vs six [1%]; p=0·014), and epileptic seizures (ten [2%] vs two [<1%]; p=0·038) at 6 months. Interpretation Oral fluoxetine 20 mg daily for 6 months after acute stroke did not improve functional outcome and increased the risk of falls, bone fractures, and epileptic seizures. These results do not support the use of fluoxetine to improve functional outcome after stroke

    Rising rural body-mass index is the main driver of the global obesity epidemic in adults

    Get PDF
    Body-mass index (BMI) has increased steadily in most countries in parallel with a rise in the proportion of the population who live in cities. This has led to a widely reported view that urbanization is one of the most important drivers of the global rise in obesity. Here we use 2,009 population-based studies, with measurements of height and weight in more than 112 million adults, to report national, regional and global trends in mean BMI segregated by place of residence (a rural or urban area) from 1985 to 2017. We show that, contrary to the dominant paradigm, more than 55% of the global rise in mean BMI from 1985 to 2017—and more than 80% in some low- and middle-income regions—was due to increases in BMI in rural areas. This large contribution stems from the fact that, with the exception of women in sub-Saharan Africa, BMI is increasing at the same rate or faster in rural areas than in cities in low- and middle-income regions. These trends have in turn resulted in a closing—and in some countries reversal—of the gap in BMI between urban and rural areas in low- and middle-income countries, especially for women. In high-income and industrialized countries, we noted a persistently higher rural BMI, especially for women. There is an urgent need for an integrated approach to rural nutrition that enhances financial and physical access to healthy foods, to avoid replacing the rural undernutrition disadvantage in poor countries with a more general malnutrition disadvantage that entails excessive consumption of low-quality calories
    corecore